
PowerShell: How Can I Determine the Local Path from a UNC Path?


The Problem
Usually we know the UNC path (like \\server\share\file_path) and need to get the local physical path, so we can log in to that machine, go to that path, and make changes.

The Solution
We can use WMI (Windows Management Instrumentation) to read and operate on Windows management information. The Win32_Share class represents a shared resource on a computer running Windows.

In PowerShell, we can use get-wmiobject to query the WMI class and -filter to specify the share name.

So now the solution is obvious: just one command line:
get-wmiobject -class "Win32_Share" -namespace "root\cimv2" -computername "computername" -filter "name='share-name'" | select name,path

Output:
name              path
----              ----
share-name        e:\users\somefolder
Reference
get-wmiobject
Win32_Share

PowerShell and Java: Stumbling over UTF-8 with BOM or without BOM


The Problem
While writing a PowerShell script to clean a CSV file and remove invalid records, I mistakenly added -encoding utf8 when using out-file to write the result to the final CSV.

Then I ran the following command to import the CSV file into Solr:
http://localhost:8080/solr/update/csv?literal.f1=1&literal.f2=2&header=true&stream.file=C:\1.csv&literal.cistate=17&commit=true
It generates a unique id, docid, by concatenating f1, f2, and the first column of the CSV file, localId.
But to my surprise, there was only one document in Solr, with docid 12.
http://localhost:8080/solr/cvcorefla4/admin/luke shows:
<int name="numDocs">1</int>
<int name="maxDoc">16420</int>
<int name="deletedDocs">16419</int>
<long name="version">2521</long>


Run http://localhost:8080/solr/select?q=*, and copy the response to a new file in Notepad++ with UTF-8 encoding: everything seems fine. But when I change the file encoding to ASCII, it looks like this:
 <str name="docid">12</str> <arr name="id"> <str>f0e662cefe56a31c6eec5d53e64f988d</str> </arr> 
Notice the garbled invisible character before the field name id. Also the field is not the expected string, but an array of strings. So I wrote a simple Java program to view the real content of "id":
public void testUnicode() {
  String str = "id";
  for (int i = 0; i < str.length(); i++) {
    System.out.println(str.charAt(i));
    System.out.println((int) str.charAt(i));
    System.out.println(escapeNonAscii(str.charAt(i) + ""));
  }
  System.out.println("***************");
  System.out.println(str.length());
  System.out.println(str.hashCode());
  System.out.println(escapeNonAscii(str));
  System.out.println("***************");
}

private static String escapeNonAscii(String str) {
  StringBuilder retStr = new StringBuilder();
  for (int i = 0; i < str.length(); i++) {
    int cp = Character.codePointAt(str, i);
    int charCount = Character.charCount(cp);
    if (charCount > 1) {
      i += charCount - 1; // skip the low surrogate
      if (i >= str.length()) {
        throw new IllegalArgumentException("truncated unexpectedly");
      }
    }
    if (cp < 128) {
      retStr.appendCodePoint(cp);
    } else {
      retStr.append(String.format("\\u%x", cp));
    }
  }
  return retStr.toString();
}
The invisible prefix is \ufeff. U+FEFF is the byte order mark (BOM). So now the problem is clear: out-file -encoding utf8 actually writes UTF-8 with a BOM, but Java reads the file as UTF-8 without a BOM. As a result, to Java the first column of the first line is \ufefflocalId, not localId.
The Solution
Actually the fix is simple: the default encoding of out-file is Unicode (UTF-16 LE), which works fine in our pipeline. If we are sure all characters are in the ASCII range, we can also specify -encoding ascii.
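To check whether a file actually starts with a BOM, we can dump its first bytes (a small sketch; the path C:\1.csv is a placeholder):

```powershell
# Read the first three bytes of the file.
# EF BB BF means UTF-8 with BOM; FF FE means UTF-16 LE with BOM.
$bytes = [System.IO.File]::ReadAllBytes("C:\1.csv")
"{0:X2} {1:X2} {2:X2}" -f $bytes[0], $bytes[1], $bytes[2]
```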
 
Resources
Byte order mark
Unicode Character 'ZERO WIDTH NO-BREAK SPACE' (U+FEFF)

PowerShell: Working with CSV: Delete Rows Without Enough Columns


The Problem
We import CSV files into a Solr server, which is very sensitive to the number of columns. If a row does not have enough columns, Solr fails with an exception:
org.apache.solr.common.SolrException: CSVLoader: input=file:/C:/1.csv, line=9158554,expected 19 values but got 17

So we would like a script to clean the CSV: remove the rows that do not have enough data, i.e. rows where the number of columns is not 19.

I don't know how to get the number of columns of the current record directly, but it's easier to check whether the value of the last field is null: that means the row does not have enough columns.

The Solution: Using Powershell
The PowerShell command would be like below (here the last field is named to):

Import-Csv .\1.csv | Where-Object { $_.to -ne $null} | Export-Csv .\rst1.csv -NoTypeInformation


To output which lines do not have enough columns:
Import-Csv .\1.csv | Foreach-Object {$line = 0} {
  if ($_.bcc -eq $null) {
    echo "ignore line: $line, not enough fields"
  } else {
    convertto-csv -inputobject $_ -NoTypeInformation |
      select -Skip 1 |
      out-file -filepath .\r1.csv -Append
  }
  $line++
}
The complete script:
cleanCsv.ps1. Usage: .\cleanCsv.ps1 -filePath .\1.csv -destFilePath .\r1.csv
[CmdletBinding()]
Param(
  [Parameter(Mandatory=$True)] [string]$filePath,
  [Parameter(Mandatory=$True)] [string]$destFilePath,
  [Parameter(Mandatory=$False)] [string]$lastField="bcc"
)
# $ignoreLine = 2323533;
Get-Date -format "yyyy-MM-dd HH:mm:ss"
$sw = [Diagnostics.Stopwatch]::StartNew()
If (Test-Path $destFilePath) {
  echo "remove old $destFilePath"
  Remove-Item $destFilePath
}
# copy the header line to the destination file
gc $filePath -TotalCount 1 | out-file -filepath $destFilePath
Import-Csv $filePath | Foreach-Object {$line = 0} {
  if ($_.$lastField -eq $null) {
    echo "ignore line: $line, not enough fields"
  } else {
    convertto-csv -inputobject $_ -NoTypeInformation |
      select -Skip 1 |
      out-file -filepath $destFilePath -Append
  }
  $line++
}
$sw.Stop()
Get-Date -format "yyyy-MM-dd HH:mm:ss"
echo "took " $sw.Elapsed

Using Powershell Get-ChildItem


List All Java Files in All Subfolders
gci -Recurse -filter *.java | % { $_.FullName }

Use gci | gm to check the type of $_ and its properties and methods.
$_ is of type System.IO.FileInfo and has properties like Name, BaseName, FullName, and Length.

Miscs
If we want to show all hidden or system files, use -Force.
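For example (a minimal sketch):

```powershell
# Without -Force, hidden and system files are skipped;
# with -Force they are included in the listing.
Get-ChildItem -Force C:\ | Select-Object Mode, Name
```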

Run Commands Faster in PowerShell


In Linux, we can use ! to repeat commands quickly: !! or !-1 or the up arrow executes the last command, and !prefix runs the last command that starts with a specific word.

We can do the same thing in PowerShell.

Get-History: alias h
Invoke-History: alias r
Call r to execute last command.
Call r prefix to execute last command that starts with a specific word:
r ant               Run last ant command.
r "git push"        Run last git push command: notice if there is space in the prefix, we have to put them in double quotes.

Use Get-History to show the IDs of commands, then run:
r id (for example, r 3)
The Invoke-History cmdlet accepts only a single ID; if we want to run multiple commands, run r 3; r 5
The first token of the last command line: $^

Run Multiple PowerShell in Tabs Mode
Use ConEmu to run multiple PowerShell instances in tabs.
Another option is Console2.

Resources
ConEmu - The Windows Terminal/Console/Prompt we've been waiting for?

PowerShell: Working with CSV Files


Background
When importing a CSV file into Solr, the import may fail because the CSV is incorrectly formatted: mostly related to double quotes in column values, or because some rows do not have enough columns.

When this happens, we may have to dig into the CSV files. PowerShell is a great tool in this case.
Task: Get Line Number of the CSV Record
When Solr fails to import a CSV, it may report the following error:
SEVERE: Import csv1.csv failed: org.apache.solr.common.SolrException: CSVLoader: input=file:/C:/csv1.csv, line=134370,expected 19 values but got 17
                values={field_values_in_this_row}
Solr says the error happens at line 134370, but if we run Get-Content csv1.csv | Select-Object -Index 134370, we may find the content of that line is totally different. This is because if there are multi-line records in the CSV file, the reported line number does not match the physical line number.
/**
 * ATTENTION: in case your csv has multiline-values the returned
 * number does not correspond to the record-number
 *
 * @return current line number
 */
public int org.apache.solr.internal.csv.CSVParser.getLineNumber() {
  return in.getLineNumber();
}

To get the correct line of the CSV record, use the following PowerShell command:
select-string -pattern 'field_values_in_this_row' csv1.csv | select Line,LineNumber
Line                        LineNumber
----                        ----------
field_values_in_this_row    134378
Task: Get Record Number of CSV File
Users want to know whether all records were imported into Solr. To do this, we need the number of all non-empty records in the CSV file. The line count of the file is not useful, as there may be empty lines or multi-line records in the CSV file.

We can use the following Powershell command: the Where-Object excludes empty records.
(Import-Csv csv1.csv | Where-Object { ($_.PSObject.Properties | ForEach-Object {$_.Value}) -ne $null} | Measure-Object).count

The previous command is slow. If we are sure there are no empty records (lines) in the CSV file, we can use the following command instead:
(Import-Csv .\csv1.csv | Measure-Object).count

Other CSV-related PowerShell Commands
Select fields from CSV file:
Import-Csv csv1.csv | select f1,f2 | Export-Csv -Path csv2.csv -NoTypeInformation
Add new fields into CSV file:
Import-CSV csv1.csv | Select @{Name="Surname";Expression={$_."Last Name"}}, @{Name="GivenName";Expression={$_."First Name"}} | Export-Csv -Path csv2.csv -NoTypeInformation
Import-Csv .\1.txt | select-object id | sort id -Unique | Measure-Object
Resources
Import CSV that Contains Double-Quotes into Solr
Improve Solr CSVParser to Log Invalid Characters

PowerShell Tips: Get a Random Sample from CSV File


The Problem

I am trying to write and test an R script against some data from a customer. But the data is too big; it would take a lot of time to load the data and run the script. So it is better to extract a small sample from the original data.

The Solution
First extract the first line from the original csv file, write to destination file.
Get-Content big.csv -TotalCount 1 | Out-File -Encoding utf8 sample.txt

Notice that by default the Out-File cmdlet and the redirection operator >> use the system default encoding when writing to a file, while most applications read data as UTF-8 or UTF-16. Hence we use -Encoding utf8 here.

Then we read all lines except the first line: Get-Content big.csv | where {$_.readcount -gt 1 }

Then randomly select 100 lines and append them to the destination file.
Get-Content big.csv | where {$_.readcount -gt 1 } | Get-Random -Count 100 | Out-File -Encoding utf8 -Append sample.txt

The Complete Script
Get-Content big.csv -TotalCount 1 | Out-File -Encoding utf8 sample.txt; Get-Content big.csv | where {$_.readcount -gt 1 } | Get-Random -Count 100 | Out-File -Encoding utf8 -Append sample.txt

Related Script: Get default system encoding
[System.Text.Encoding]::Default
[System.Text.Encoding]::Default.EncodingName

Resource
PSTip: Get-Random
Get-Random Cmdlet

PowerShell in Action: Analyze Log and Interact with Solr


The Problem
We need to write a program that analyzes Solr logs to find out why some items that the local Solr server fetches from the remote Solr server are missing.
We suspect it's because of the deduplication configuration: items that have the same values for the signature fields are marked as duplicates and removed by Solr. We need to analyze the log and find all these items.
Why Use PowerShell?
1. PowerShell is preinstalled with Windows 7, Windows Server 2008 R2, and later Windows releases.
2. It's powerful: we can even call .NET from a PowerShell script.
3. It's an interpreted language, which means we can easily change the script and rerun it. No need to compile and package as with Java or .NET.
4. I have worked as a Java programmer for more than 6 years; it's kind of boring to write this program in Java, so why not try a new tool and learn something new :)
Analyze Log
In Linux, we can use awk and grep to search the log and extract fields.
In PowerShell, we use Get-Content and Foreach-Object. In Foreach-Object, we test whether the current item (log line) contains "Got id"; if so, we split it by whitespace, take the fourth token (index 3), and write the result to a temporary file.

Get-Content $logs | Foreach-Object {
  if ($_.Contains("Got id")) {
    $a = $_.Split()[3]
    $a.Substring(0, $a.Length-1)
  }
} | out-file ".\ids.txt"
Interact with Solr
We then read 100 ids at a time from the temp file, construct a URL, use Net.HttpWebRequest to send an HTTP request, and use Net.HttpWebResponse and IO.StreamReader to read the HTTP response.

In PowerShell 3.0 and newer, we can use Invoke-WebRequest to execute the HTTP request and parse the response.
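A minimal sketch of that alternative, assuming $solrServer and $ids are already defined as in the script below:

```powershell
# Build the same Solr query URL and fetch it with Invoke-WebRequest (PS 3.0+).
$url = $solrServer + "/select?fl=contentid&omitHeader=true&q=" +
       (($ids | ForEach-Object { "contentid:$_" }) -join " OR ")
$response = Invoke-WebRequest -Uri $url -TimeoutSec 600
$output = $response.Content
```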

We then check each id against the response: if it doesn't appear there, it is missing in Solr, and we save it to the result file.
$count = 100
$ids = @()
gc .\ids.txt | foreach {$i=0} {
  $ids += $_
  $i++
  if ($i -eq $count) { checkSolr $ids; $ids=@(); $i=0 }
}

Function checkSolr ($ids) {
  $url = $solrServer + "/select?fl=contentid&omitHeader=true&q="
  foreach ($id in $ids) { $url += "contentid:$id OR " }
  $url = $url.SubString(0, $url.Length-3)
  [Net.HttpWebRequest] $req = [Net.WebRequest]::create($url)
  $req.Method = "GET"
  $req.Timeout = 600000 # = 10 minutes
  [Net.HttpWebResponse] $result = $req.GetResponse()
  [IO.Stream] $stream = $result.GetResponseStream()
  [IO.StreamReader] $reader = New-Object IO.StreamReader($stream)
  [string] $output = $reader.ReadToEnd()
  $stream.Flush()
  $stream.Close()
  # A foreach loop doesn't output to the pipeline.
  foreach ($id in $ids) {
    $idx = $output.IndexOf($id)
    if ($idx -eq -1) {
      $notExistStream.WriteLine("$id not in solr")
    } else {
      if ("$existFile" -ne "") { $existStream.WriteLine("$id exist in solr") }
    }
  }
}
Complete Code
[CmdletBinding()]
Param(
  [Parameter(Mandatory=$True,Position=1)] [String]$solrServer,
  [Parameter(Mandatory=$True,Position=2)] [String[]]$logs,
  [Parameter(Mandatory=$True)] [string]$notExistFile,
  [Parameter(Mandatory=$False)] [string]$existFile
)
Function checkSolr ($ids) {
  $url = $solrServer + "/select?fl=contentid&omitHeader=true&q="
  foreach ($id in $ids) { $url += "contentid:$id OR " }
  $url = $url.SubString(0, $url.Length-3)
  [Net.HttpWebRequest] $req = [Net.WebRequest]::create($url)
  $req.Method = "GET"
  $req.Timeout = 600000 # = 10 minutes
  [Net.HttpWebResponse] $result = $req.GetResponse()
  [IO.Stream] $stream = $result.GetResponseStream()
  [IO.StreamReader] $reader = New-Object IO.StreamReader($stream)
  [string] $output = $reader.ReadToEnd()
  $stream.Flush()
  $stream.Close()
  # A foreach loop doesn't output to the pipeline.
  foreach ($id in $ids) {
    $idx = $output.IndexOf($id)
    if ($idx -eq -1) {
      $notExistStream.WriteLine("$id not in solr")
    } else {
      if ("$existFile" -ne "") { $existStream.WriteLine("$id exist in solr") }
    }
  }
}
function createNewFile($file) {
  if (Test-Path -Path $file) { Remove-Item $file }
  New-Item $file -ItemType file
  $file = $(Resolve-Path $file).ToString()
}
Write-Host (Get-Date).tostring(), "script started" -BackgroundColor "Red" -ForegroundColor "Black"
$elapsed = [System.Diagnostics.Stopwatch]::StartNew()
Get-Content $logs | % {
  if ($_.Contains("Got id")) {
    $a = $_.Split()[3]
    $a.Substring(0, $a.Length-1)
  }
} | out-file ".\ids.txt"
Write-Host (Get-Date).tostring(), "created ids.txt" -BackgroundColor "Red" -ForegroundColor "Black"
# create the output streams before checkSolr writes to them
$notExistFile = createNewFile $notExistFile
$notExistStream = [System.IO.StreamWriter] "$notExistFile"
if ("$existFile" -ne "") {
  createNewFile $existFile
  $existStream = [System.IO.StreamWriter] "$existFile"
}
$count = 100
$ids = @()
gc .\ids.txt | foreach {$i=0} {
  $ids += $_
  $i++
  if ($i -eq $count) { checkSolr $ids; $ids=@(); $i=0 }
}
# check the remaining ids
checkSolr $ids
$notExistStream.close()
if ($existStream) { $existStream.close() }
Write-Host (Get-Date).tostring(), "script finished" -BackgroundColor "Red" -ForegroundColor "Black"
write-host "Total Elapsed Time: $($elapsed.Elapsed.TotalSeconds)" -BackgroundColor "Red" -ForegroundColor "Black"
PowerShell GUI
PowerGUI

Managing Windows Services


On the command line, we can use Tasklist to find the process in which a service is running:
Tasklist /FI "SERVICES eq serviceA"

Use Taskkill to kill the process the service is running in.
Taskkill /F /FI "SERVICES eq serviceA"

We can use the sc command to query and delete services.
sc query serviceA
If the service doesn't exist, it shows the following error:
[SC] EnumQueryServicesStatus:OpenService FAILED 1060:
The specified service does not exist as an installed service.

sc start serviceA
sc stop serviceA
sc delete serviceA

Or we can use net start|stop serviceA to start, stop a service.

In a Windows batch script, we can use SC QUERY to query a service and check ERRORLEVEL to determine whether the service exists:
SC QUERY serviceA > NUL
REM ERRORLEVEL 1060 means it doesn't exist
IF ERRORLEVEL 1060 (
echo service doesn't exist
)

Manage Service in PowerShell
get-service serviceA
start-service serviceA
stop-service serviceA
restart-service serviceA

(Get-WmiObject Win32_Service -filter "name='serviceA'").Delete()
$svc = gwmi win32_service -filter "name='alerter'"
$svc.delete()

Resources
SC Command

Using Mergeindexes and PowerShell to Automate Deployment of Solr Index to Remote Production Machines


The Problem
We make changes to our documentation site periodically, use Nutch to crawl it, save the index to the Solr server on the local build machine, and test it.

On the release date, we deploy the new index to the production machines. We want to minimize downtime, so we can't restart the Solr server on the production machines.
The Solution
Luckily, Solr provides the mergeindexes tool. It doesn't support merging remote indexes, but we can easily use PowerShell to copy the new index to the production machines, then run mergeindexes locally.

The reason we choose Windows PowerShell is that PowerShell supports UNC paths (like \\serverA\labelB\pathc), which Windows batch does not.
Steps and Script
1. Crawl the vendorA doc site into core core_vendorA on the build machine.
PowerShell Script
This step is optional, as the site should already be crawled and tested before deployment. We include the script here for completeness.
We created a ServerResource on the Nutch side to expose an HTTP API to start/stop/edit/delete a crawl task and monitor crawl status. Please refer to Nutch2: Extend Nutch2 to Crawl via Http API.

$data = '{\"solrURL\":\"http://solrServerInbuildMachine/solr/vendorA/\",\"crawlID\":\"crawl_vendorA_ID1\", \"taskName\":\"taskl_vendorA_ID1\",\"crawlDepth\":2,\"urls\":[\"http://docsite:port/rootpath/\"], \"includePaths\":[\"+^(?i)http://docsite:port/rootpath/\"],\"subCollections\":[{\"name\":\"vendorA\", \"id\":\"vendorA\",\"whiteList\":[\"http\",\"cifs\",\"file\",\"ftp\"]},{\"name\":\"vendorA\", \"id\":\"vendorA\",\"whiteList\":[\"http\",\"cifs\",\"file\",\"ftp\"]}],\"solrindexParams\":\"update.chain=webCrawlerChain\", \"delOldDataQuery\":\"subcollection:vendorA\",\"sync\":true,\"deleteIfExist\":true,\"updateDirectly\":false, \"tmpCoreName\":\"core-tmp1\",\"cleanData\":true,\"startTask\":true,\"reuseIfExist\":false, \"fileToFileMappings\":{\"conf/nutch-site.xml\":\"conf/predefinedTasks/nutch-site-templateA.xml\"}} ' &curl -X PUT -H "Content-Type: application/json" -d $data http://nutchServer:port/nutch/cvcrawler 
2. After the crawl finishes, zip the index folder solr\data\core_vendorA\index, copy it to each production machine under %PREFIX%\new-index\core_vendorA\index, and unzip it.
PowerShell Script
if (-not (test-path "$env:ProgramFiles\7-Zip\7z.exe")) { throw "$env:ProgramFiles\7-Zip\7z.exe needed" }
set-alias sz "$env:ProgramFiles\7-Zip\7z.exe"
cd %LOCALBUILPATH%solr\data\core_vendorA\index
&sz a -tzip index.zip -mx3
copy-item index.zip \\prodMachineA\new-index\vendorA\index
cd \\prodMachineA\new-index\vendorA\index
&sz x index.zip
Do the following steps for each production machine
3. Delete old index for vendor core_vendorA by sending a solr request:
curl "http://localhost:8080/solr/core_vendorA/update?commit=true&stream.body=<delete><query>*:*</query></delete>"

4. Merge index from new-index\core_vendorA\index to core core_vendorA by sending a solr request:
curl "http://localhost:8080/solr/admin/cores?action=mergeindexes&core=core_vendorA&indexDir=%PREFIX%\new-index\core_vendorA\index"

5. Commit the merged index by sending a commit request to solr: 
curl "http://localhost:8080/solr/core_vendorA/update?commit=true"

Steps 3, 4, and 5 are pretty fast, usually taking less than 1 minute.

Resources
Nutch2: Extend Nutch2 to Crawl via Http API

Commonly Used Windows PowerShell Commands


One reason we like Linux is that it's so easy to complete common (administration) tasks via the shell or scripting.

But sometimes we have to work on Windows and are not able to install Cygwin.
Luckily, Microsoft provides PowerShell, and it's preinstalled with Windows 7, Windows Server 2008 R2, and later Windows releases.

PowerShell is cool and useful. It's different from Linux shells in that it is completely object-oriented.
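A small illustration of that difference (a sketch; the file names are hypothetical): cmdlets pass objects with typed properties down the pipeline, not plain text.

```powershell
# Each item is a System.IO.FileInfo object, so we can filter
# on its Length property instead of parsing text columns.
Get-ChildItem *.log |
  Where-Object { $_.Length -gt 1MB } |
  Select-Object Name, Length
```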
Common Folder/File Operations
Create a folder
mkdir c:\f1\f2\f3
md c:\f1\f2\f3
New-Item c:\f1\f2\f3 -ItemType directory
Delete a folder
rm -r c:\f1\f2\f3
Create a file
New-Item c:\f1\f2\f3 -ItemType file -force -value "hello world"
cat c:\f1\f2\f3

Delete Files
Remove-Item -Recurse -Force .\incubator-blur #like linux rm -rf
Remove-Item c:\scripts\* -include *.txt -exclude *test*
Extract lines from files
Get first 10 lines as head -10 in linux
Get-Content -Path my.csv -TotalCount 10
Get last 10 lines as tail -10 in Linux
Get-Content -Path my.csv | Select-Object -Last 10
Get the 11th line (-Index is zero-based)
Get-Content -Path my.csv | Select-Object -Index 10
Get the 11th to 101st lines
Get-Content -Path my.csv | Select-Object -Index (10..100)
Get the 11th and 101st lines
Get-Content -Path my.csv | Select-Object -Index (10, 100)
Search recursively for a certain string within files
Get-ChildItem -Recurse -Filter *.log | Select-String Exception
Get-ChildItem -Recurse -Filter *.log | Select-String -CaseSensitive -Pattern Exception

Tail -f in PowerShell

In PowerShell 3.0 and newer, Get-Content supports -Tail:
Get-Content error.log -Tail 10 -Wait
Get-Content error.log -wait
Get-Content error.log -wait | Where-Object { $_ -match "Exception" } 
-match is case-insensitive; -cmatch is case-sensitive.
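For example:

```powershell
# -match ignores case, -cmatch does not.
"Exception" -match "exception"    # case-insensitive: matches
"Exception" -cmatch "exception"   # case-sensitive: does not match
```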

List All Java Files in All Subfolders
gci -Recurse -filter *.java | % { $_.FullName }

Select-String
(select-string -path audit.log -pattern "logon failed").count
Select-String C:\Scripts\Test.txt -pattern "failure" -context 3,1

Measure-Object
Display the number of characters, words, and lines in a text file.
get-content C:\test.txt | measure-object -character -line -word
get-childitem | measure-object -property length -minimum -maximum -average
import-csv d:\test\serviceyrs.csv | measure-object -property years -minimum -maximum -average

Find the ten processes using the most memory
Get-Process | Sort-Object -Property WS -Descending | Select-Object -First 10

Delete a directory and all files within it

Remove-Item foldername -Recurse

Rename all .TXT files as .LOG files in the current directory:
Get-Childitem -Path *.txt | rename-item -NewName {$_.name -replace '\.txt$','.log'}

Miscs
Restart-Computer -Force -ComputerName TARGETMACHINE
Run a script on a remote computer
invoke-command -computername machine1, machine2 -filepath c:\Script\script.ps1

Using Get-WmiObject
List all WMI classes:
Get-WmiObject -List
Get-WmiObject -Class Win32_ComputerSystem 
Get-WmiObject -Class Win32_BIOS -ComputerName .
gwmi win32_service -filter "name like 'Oracle%'" | select name 
gwmi win32_service -filter "startmode='auto'" | select name,startmode
(gwmi win32_service -filter "name='alerter'").StopService()
